Genome sequencing and functional analysis of a multipurpose medicinal herb Tinospora cordifolia (Giloy)

Tinospora cordifolia (Willd.) Hook.f. & Thomson, also known as Giloy, is among the most important medicinal plants that have numerous therapeutic applications in human health due to the production of a diverse array of secondary metabolites. To gain genomic insights into the medicinal properties of T. cordifolia, the genome sequencing was carried out using 10× Genomics linked read and Nanopore long-read technologies. The draft genome assembly of T. cordifolia was comprised of 1.01 Gbp, which is the genome sequenced from the plant family Menispermaceae. We also performed the genome size estimation for T. cordifolia, which was found to be 1.13 Gbp. The deep sequencing of transcriptome from the leaf tissue was also performed. The genome and transcriptome assemblies were used to construct the gene set, resulting in 17,245 coding gene sequences. Further, the phylogenetic position of T. cordifolia was also positioned as basal eudicot by constructing a genome-wide phylogenetic tree using multiple species. Further, a comprehensive comparative evolutionary analysis of gene families contraction/expansion and multiple signatures of adaptive evolution was performed. The genes involved in benzyl iso-quinoline alkaloid, terpenoid, lignin and flavonoid biosynthesis pathways were found with signatures of adaptive evolution. These evolutionary adaptations in genes provide genomic insights into the presence of diverse medicinal properties of this plant. The genes involved in the common symbiosis signalling pathway associated with endosymbiosis (Arbuscular Mycorrhiza) were found to be adaptively evolved. The genes involved in adventitious root formation, peroxisome biogenesis, biosynthesis of phytohormones, and tolerance against abiotic and biotic stresses were also found to be adaptively evolved in T. cordifolia.


Genome sequencing and functional analysis of a multipurpose medicinal herb Tinospora cordifolia (Giloy)
Shruti Mahajan , Abhisek Chakraborty , Manohar S. Bisht , Titas Sil & Vineet K. Sharma * Tinospora cordifolia (Willd.)Hook.f.& Thomson, also known as Giloy, is among the most important medicinal plants that have numerous therapeutic applications in human health due to the production of a diverse array of secondary metabolites.To gain genomic insights into the medicinal properties of T. cordifolia, the genome sequencing was carried out using 10× Genomics linked read and Nanopore long-read technologies.The draft genome assembly of T. cordifolia was comprised of 1.01 Gbp, which is the genome sequenced from the plant family Menispermaceae.We also performed the genome size estimation for T. cordifolia, which was found to be 1.13 Gbp.The deep sequencing of transcriptome from the leaf tissue was also performed.The genome and transcriptome assemblies were used to construct the gene set, resulting in 17,245 coding gene sequences.Further, the phylogenetic position of T. cordifolia was also positioned as basal eudicot by constructing a genome-wide phylogenetic tree using multiple species.Further, a comprehensive comparative evolutionary analysis of gene families contraction/expansion and multiple signatures of adaptive evolution was performed.The genes involved in benzyl iso-quinoline alkaloid, terpenoid, lignin and flavonoid biosynthesis pathways were found with signatures of adaptive evolution.These evolutionary adaptations in genes provide genomic insights into the presence of diverse medicinal properties of this plant.The genes involved in the common symbiosis signalling pathway associated with endosymbiosis (Arbuscular Mycorrhiza) were found to be adaptively evolved.The genes involved in adventitious root formation, peroxisome biogenesis, biosynthesis of phytohormones, and tolerance against abiotic and biotic stresses were also found to be adaptively evolved in T. cordifolia.Tinospora cordifolia is a climbing shrub belonging to the Menispermaceae family that includes more than 400 plant species 1 .The present scientific name of this plant is Tinospora cordifolia (Willd.)Hook.f.& Thomson according to the 'USDA-ARS GRIN Taxonomy' , whereas its scientific name as per the 'International Plant Names Index' is Tinospora cordifolia.Thus, we have referred to it as Tinospora cordifolia in the subsequent text.This plant has gained tremendous therapeutic interest and significance during the COVID-19 pandemic due to its enormous therapeutic potential 2 .It perhaps originated in Africa in the Oligocene epoch (28.57 million years ago) and was spread to Asia in the early Miocene epoch (21.54 million years ago) 3 .It is found in tropical and

Phylogenetic analysis
Among the identified 152,176 orthogroups, 224 fuzzy one-to-one orthogroups were predicted across 29 selected plant species.The filtered 224 fuzzy one-to-one orthogroups contained 201,201 alignment positions and were used to construct a phylogenetic tree.The position of T. cordifolia in the phylogenetic tree was found close to Papaver somniferum with estimated divergence time of ~ 122 MYA (95% CI), which also concurs with the species divergence time obtained from the TimeTree database 43 .T. cordifolia and P. somniferum form a separate clade from all the core eudicots and monocot species because the clade of T. cordifolia is basal eudicots that diverged very early from the core eudicots (145 MYA with 95% CI) 44,45 (Fig. 2 and Supplementary Figure S1).

Gene family contraction and expansion analysis
The CAFÉ analysis resulted in 5169 contracted and 782 expanded gene families in T. cordifolia.Among these gene families, 12 were highly contracted, and 25 were highly expanded in T. cordifolia.The highly contracted genes families were involved in phenylpropanoid biosynthesis, mitochondrial biogenesis, protein processing in endoplasmic reticulum, MAPK signalling pathway, and plant-pathogen interaction (Supplementary Table S9).
The highly expanded gene families were majorly involved in phenylpropanoid biosynthesis, zeatin biosynthesis, carbohydrate metabolism, xenobiotic degradation, membrane trafficking, and MAPK signalling pathway (Supplementary Table S10).

Orthologous gene clustering
The phylogenetic tree constructed using the peptide sequences of six Ranunculales species showed that T. cordifolia is closer to Ranunculaceae members i.e., T. thalictroides and C. chinensis compared to the other considered www.nature.com/scientificreports/species (Fig. 3a).Among all the orthologous gene set, 66.9% (7321) gene set were shared across all six species, indicating their conservation.2.7% (304) gene set were specific to T. cordifolia (Fig. 3b).The GO functions of these species-specific genes are mentioned in Fig. 3c,d, and e.

Genes with signatures of adaptive evolution
The comparative evolutionary analysis of the six selected species of order Ranunculales revealed 724 genes with high rate of evolution, 3432 genes with unique amino acid substitutions with functional impact and 3566 genes with positive selection in T. cordifolia (Supplementary Table S11 and Supplementary Table S12).Among these genes, 290 showed all three signatures of adaptive evolution in T. cordifolia (Supplementary Figure S2 and Supplementary Table S13).

Benzyl iso-quinoline alkaloid (BIA) biosynthesis pathway in T. cordifolia
Alkaloids are the main bioactive compounds of Menispermaceae and Ranunculales plants.The genes involved in the BIA biosynthesis pathway were observed in T. cordifolia with transcriptomic evidences (Fig. 5, Supplementary Figure S3, and Supplementary Table S14).BIAs such as berberine and magnoflorine were also reported to have pharmaceutical activities against the COVID-19 virus 14,23 .Along with these, palmatine, tembetarine, and jatrorrhizine are the other pharmaceutically active biomolecules in T. cordifolia 7 .Among the genes involved in the biosynthesis of BIAs in T. cordifolia, two genes, Tyrosine decarboxylase (TYDC) and Canadine synthase (CAS), showed multiple signatures of adaptive evolution.Additionally, tyrosine aminotransferase (TAT ) showed unique amino acid substitutions in T. cordifolia.The analysis of orthologous genes found seven genes (4OMT, CNMT, CoOMT, RNMT, CST, NOMT, and SOMT) with bacterial orthologs and three genes (CNMT, RNMT, and TYDC) with fungal orthologs (Supplementary Figures S4-18).CAS gene had orthologs from bryophytes and gymnosperms.RNMT had the highest number of bacterial and fungal orthologs among all the other BIA biosynthesis genes.Further, all genes except CST had orthologs from monocots.

Other secondary metabolites biosynthesis pathway
Apart from BIA biosynthesis pathway, genes involved in terpenoid backbone biosynthesis, lignin biosynthesis, and flavonoid biosynthesis were found in T. cordifolia genome with transcriptomic evidences (Supplementary Figure S3).Among the 19 genes involved in terpenoid backbone biosynthesis, eight genes showed at least one signature of adaptive evolution.Out of the eight genes, DXS, AACT and FPPS showed MSA, PMK, and FOLK showed positive selection, ispE and GGPS1 showed unique amino acid substitutions, and ispH showed high nucleotide divergence (Fig. 5, Supplementary Table S15, and Supplementary Figure S3).Genes PAL, C4H, and 4CL, involved in both flavonoid and lignin biosynthesis pathways, showed signatures of adaptive evolution in T. cordifolia.Along with these three genes, five other genes of flavonoid biosynthesis showed at least one signature of adaptive evolution.Among these eight genes, PAL, 4CL, DFR, and ANR showed multiple signatures of adaptive evolution.UFGT showed only positive selection, and C4H, F3H and FLS showed only unique amino acid substitutions.Among the 13 genes involved in lignin biosynthesis, eight genes showed at least one signature of adaptive evolution.PAL, 4CL, C3H, CCR , and CAD showed multiple signatures of

Adventitious root associated genes in T. cordifolia
Adventitious roots are aerial roots that emerge from the plant shoot system and function as storage, support, etc. 48.T. cordifolia, being a climbing plant, bears assimilatory roots (a type of adventitious roots) that help plants anchorage on other trees or structures, and perform photosynthesis 49,50 .25 genes associated with adventitious root development in plants showed signatures of adaptive evolution in T. cordifolia (Supplementary Table S17).Among these 25 genes, a few genes also showed evidence of their gene expression in T. cordifolia (Supplementary Figure S3).Out of the 25 genes, 17 genes showed MSA, and eight showed only one signature of adaptive evolution.Plant growth, phytohormones and stress tolerance associated genes in T. cordifolia 76 genes with all three signatures of adaptive evolution were associated with plant growth and development involved in leaf development, root development, fruit ripening, tissue development, pollen germination, flowering regulation, seed development, cell wall modification, etc.Among these, ten genes were associated with root development in T. cordifolia.28 genes that were associated with phytohormones such as auxin, gibberellin, abscisic acid, ethylene, jasmonic acid, salicylic acid, cytokinin, and brassinosteroids showed all three signatures of adaptative evolution.Among these, nine genes were associated with auxin, and eight genes were associated with abscisic acid.Among plant stress tolerance-associated genes, 49 and 34 genes were involved in abiotic and biotic stress tolerance, respectively.These 49 abiotic stress tolerance genes were involved in light/heat tolerance, cold tolerance, salt tolerance, osmotic stress tolerance and oxidative stress regulation and tolerance (Supplementary Table S19).

Discussion
Tinospora cordifolia is known to produce several phytochemicals as secondary metabolites that are responsible for its various medicinal properties 11,52 .Being a medicinally important herb with therapeutic applications in multiple health conditions, the genome and transcriptome sequencing of T. cordifolia was much needed to gain genomic insights into the secondary metabolites production responsible for its medicinal properties.This study reports the draft genome assembly of T. cordifolia which is also the genome from the Tinospora genus and its family.This study also deliberates the genome assembly, transcriptome assembly, gene set, and phylogeny of T. cordifolia that will help in understanding the medicinal properties of this multipurpose herbal plant.The genome assembly was constructed using a hybrid approach that has emerged as a promising methodology to significantly increase the contiguity in the genome assembly by increasing scaffold N50 and decreasing the number of scaffolds compared to short-read technology.This hybrid approach helped in the sequencing and assembly of this complex genome of T. cordifolia with high heterozygosity and high repetitive content 53 .Further, this observed high heterozygosity of T. cordifolia could also be associated with its dioecious nature as reported in earlier studies [54][55][56] .
T. cordifolia belongs to the order Ranunculales and our phylogenetic tree finds its position as a distinct branch separate from all the considered core eudicots and the monocot (Fig. 2).This could be due to the early divergence of order Ranunculales from all the other core eudicots.Order Ranunculales is regarded as an early diverging eudicot order and is among a few other eudicot orders (collectively known as basal eudicots) that are found to be sister lineage to the core eudicots, which was also observed in the case of T. cordifolia that showed early divergence from all other dicot species 44,57,58 .According to the phylogenetic tree, T. cordifolia was found in a separate clade as basal eudicot with P. somniferum (122 MYA) diverged from core eudicots (145 MYA), which was also supported by other studies (Fig. 2) [59][60][61][62][63][64] .
Alkaloids, terpenoids, lignans, and flavonoids are the main secondary metabolites in T. cordifolia.Among the alkaloids, benzyl iso-quinoline alkaloid compounds like berberine, palmatine, magnoflorine, and tembetarine are the key bioactive compounds in T. cordifolia.Among the BIAs, berberine and magnoflorine have been reported to have immunomodulatory, antibacterial, and anti-viral activities 14,16,65 .Among a few tested compounds from T. cordifolia, berberine was reported to have the highest affinity against SARS-CoV-2 targets 66 .It also inhibited DNA and protein synthesis in bacteria 65 .Another BIA, tembetarine, possesses anti-diabetic and antibacterial activities, whereas columbamine has immunomodulatory activity 16,67 .Additionally, its terpenoids and flavonoids are reported to possess antidiabetic activity, whereas lignans are reported to have anti-oxidant activity in T. cordifolia 8,11 .These alkaloids, terpenoids, lignans, flavonoids along with various other phytochemicals, thus contribute to the diverse medicinal properties of T. cordifolia.Therefore, we traced the genes involved in the biosynthesis pathways of benzyl iso-quinoline alkaloids, terpenoids, lignans, and flavonoids in T. cordifolia, and found signatures of adaptive evolution along with transcriptomic evidences in a few of these genes (Figs. 4, 5 and Supplementary Figure S3).The genes with signatures of adaptive evolution involved in BIA (TAT , TYDC and CAS) and terpenoids (DXS, AACT and FPPS) are also reported in other plant species 64,[68][69][70][71]   have significance in other plant genomes [72][73][74][75] .The observed evolutionary signatures in genes involved in these pathways might contribute to the production of diverse secondary metabolites in T. cordifolia.
The presence of endophytic fungi in plants helps in the availability of essential mineral nutrients, which improve plant growth via phytohormone production, tolerance against stressed conditions like drought, salinity, heavy metals, high or low temperatures, etc. 33,76,77 .However, among the 12 conserved Arbuscular Mycorrhiza (AM) endosymbiosis-associated genes in plants, only one gene CYCLOPS is found to be absent in T. cordifolia, which was similar to other AM endosymbiosis-associated plant species such as Fragaria x ananassa, Castanea mollissima and Quercus robur 47 .Most genes mentioned above are directly involved in CSSP, which is responsible for AM endosymbiosis in plants.The genes involved in this pathway showed signatures of adaptive evolution in T. cordifolia, which indicates towards the evolution of AM symbiosis (Fig. 6 and Supplementary Table S16).These genes with evolutionary signatures were also reported in other plant species and associated with endosymbiosis 78,79 .The AM endosymbiosis is also influenced by phytohormones like auxin, cytokinin, jasmonic acid, brassinosteroids, etc. 80,81 .Thus, the genes associated with these phytohormones also showed signatures of adaptive evolution which further suggests the evolution of AM symbiosis in T. cordifolia (Supplementary Table S13).Moreover, endophytic bacteria and fungi present in T. cordifolia are Bacillus spp., Aneurinibacillus spp., Pseudomonas spp., Penicillium spp., Nigrospora oryzae, Chaetomium globosum, Colletotrichum spp., Curvularia spp., Alternaria alternata, Phome spp., etc. 36,37 .These endophytes are known to produce secondary metabolites, and phytohormones like indole acetic acid and gibberellins that helps in the growth and stress tolerance of their host plants [82][83][84][85][86][87][88][89][90][91] .Thus, evolutionary signatures in genes associated with common symbiosis signalling pathway might contribute to the evolution of medicinal properties in T. cordifolia.T. cordifolia, being a climbing plant, remains in the shades of the supportive tree where lesser sunlight might reach the plant 49 .Thus, these plants increase their photosynthesis capacity and anchorage by forming assimilatory roots.These aerial roots perform photosynthesis and provide anchorage that contributes to plant growth 48 .The adventitious roots in T. cordifolia were found to be evolved due to the presence of signatures of adaptive evolution in these genes involved in adventitious root formation (Supplementary Table S17).These genes with evolutionary signatures were also reported in other plant species and play key roles in adventitious root formation [92][93][94] .Additionally, phytohormones also play an important role in regulating adventitious root formation in plants 95 .Along with the genes involved in adventitious root formation, phytohormone associated genes were also found to be evolved in T. cordifolia.Peroxisomes play an important role in photorespiration and plant growth that occurs in all C3 plants 51 .T. cordifolia, a C3 plant, performs photorespiration and lacks adaptations like the CAM pathway of CAM plants.Therefore, the signatures of adaptive evolution observed in genes associated with peroxisome biogenesis appear to be significant for C3 photosynthesis contributing to plant growth in T. cordifolia (Fig. 7 and Supplementary Table S18).These evolutionary signatures in genes associated with adventitious roots, phytohormones, and peroxisomes contribute towards the overall growth of T. cordifolia.Taken together, the comprehensive comparative evolutionary analysis helped to reveal the evolution in genes involved in common symbiosis signalling pathway, adventitious root formation, peroxisome associated, abiotic and biotic stress tolerance, and in genes associated with various phytohormones; thus, providing genomic clues on the diverse medicinal properties of this species.
The study reports the draft genome of T. cordifolia and provides important insights on the genomic basis of its medicinal properties due to the presence of diverse secondary metabolites.However, the presence of these metabolites also posed problems during the DNA and RNA extraction steps, and also severely reduced the data throughput from the ONT nanopore sequencing technologies.Thus, it is required to develop improved nucleic acid extraction protocols to deal with such medicinal plants.Further, the presence of high heterozygosity and repetitive content in T. cordifolia genome posed challenges during the assembly process and limited the assembly statistics such as the N50 score.The availability of more data from both second and third generation sequencing technologies is much needed to achieve a better coverage and a more contiguous assembly of this species.Further, due to the absence of genomes from the Tinospora genus and Menispermaceae family, species belonging to the same order were used for evolutionary analysis to eliminate the effect of large genetic distances.The availability of more genomes from closely related plant species would offer a deeper understanding of the species-specific or family-specific characteristics of T. cordifolia.Additionally, conducting experimental validation on the role of adaptively evolved genes in secondary metabolites biosynthesis will provide further support to the results of these analysis and will help in developing better approaches to benefit from the medicinal properties of this plant.

Conclusion
The insights obtained from this study and the availability of the genome sequence of T. cordifolia will help bridge the missing link between its genomic and medicinal properties and provide leads for exploring the genomic basis of these properties.It will also aid in various comparative genomic studies and act as a reference for future species sequenced from its genus and family.It will also help in the genome-wide phylogenetic assessments and evolutionary analyses of this species.The knowledge of mechanisms and pathways involved in producing its numerous medicinally important secondary metabolites will help better exploit these pathways and resultant metabolites for medicinal purposes and therapeutic applications.

Sample collection, species identification, nucleic acids extraction, and sequencing
The plant was brought from a nursery in Bhopal, Madhya Pradesh, India (23.2599°N, 77.4126° E).All methods were performed in accordance with the national guidelines concerning research involving plants.The leaves from the young plant were cleaned and used for DNA and RNA extractions.For DNA extraction, the leaves were homogenized in liquid nitrogen.For genomic DNA extraction, the powdered leaf was washed with 70% ethanol and distilled water to eliminate any such compounds that may hinder the extraction process, and employed CTAB-based lysis buffer for the isolation 96 .Two genes: one nuclear gene and one chloroplast gene (internal transcribed spacer and Maturase K, respectively), were used for the species identification.These genes were amplified and sequenced at an in-house Sanger sequencing facility.The total RNA was extracted with TRIzol reagent from the leaf sample using multiple reactions and the RNA was then pooled before proceeding for the library preparation 97 .The TruSeq Stranded Total RNA library preparation kit with Ribo-Zero Plant workflow (Illumina, Inc., USA) was deployed for preparing the transcriptomic library.The genomic library for linked reads was prepared using a Gel Bead kit and Chromium Genome library kit on a Chromium Controller instrument (10× Genomics, CA, USA).The quality of both the libraries (transcriptomic and genomic) was checked on Tapestation 4150 (Agilent, Santa Clara, CA) and sequenced on an Illumina platform, NovaSeq 6000 (Illumina, Inc., USA) for producing paired-end reads.For Nanopore sequencing, the genomic DNA libraries were prepared using SQK-LSK109 and 110, and sequenced on MinION sequencer.The comprehensive DNA and RNA extraction procedure is mentioned in the Supplementary Text.

Genome assembly
An array of Python scripts (https:// github.com/ ucdav is-bioin forma tics/ proc1 0xG) were used to remove the barcode sequences from the 10× Genomics linked reads.GenomeScope v2 was employed to calculate heterozygosity similar to other studies and SGA-preqc for genome size estimation of T. cordifolia using the k-mer count distribution method [98][99][100] .To reduce the impact of sequencing errors on the estimation, only the k-mers with higher occurrences are considered in this k-mer count distribution method.Initially, the sga preprocess was run www.nature.com/scientificreports/with the option "-pe-mode 1" to consider all the paired-end barcode-filtered 10× Genomics reads.Following this step, the pre-processed reads were indexed by sga index with 'ropebwt' algorithm and "-no-reverse" options.Finally, the SGA-preqc was run with default options for the genome size estimation.The de novo assembly was generated by Supernova assembler v.2.1.1 (with maxreads = all options and other default settings) using 499.36 million raw reads 101 .The 'pseudohap2' style in Supernova mkoutput was implemented to assemble the haplotype-phased genome.The barcodes of linked reads were processed using Longranger basic v2.2.2 (https:// suppo rt.10xge nomics.com/ genome-exome/ softw are/ pipel ines/ latest/ insta llati on) and these processed reads were used by Tigmint v1.2.1 to rectify the misassemblies present in Supernova assembled genome 102 .AGOUTI v0.3.3 with qualityfiltered transcriptome reads was used to accomplish the initial scaffolding 103 .To construct a more contiguous assembly, ARCS v1.1.1 with the barcode-processed linked reads was used to provide additional scaffolding and enhance the contiguity of the genome assembly 104 .LINKS v1.8.6 was also used with the adapter-filtered Nanopore reads (using Porechop) for additional scaffolding 105 .LR_Gapcloser used Nanopore reads to perform gap-closing of the assembly 106 .Using a bloom filter-based method and k-mer value ranging from 30 to 120 with a ten bp interval, Sealer v2.1.5used the linked reads (barcode processed) for gap-closing in the assembly 107 .Performing scaffolding multiple times could give rise to local misassemblies, small indels, or distinct base errors that were overwhelmed using Pilon v1.23 which utilized the linked reads (barcode-processed) to increase the assembly quality 108 .The obtained assembly was length based filtered such that scaffolds with length ≥ 2000 bp were retained.The completeness of genome assembly was evaluated with BUSCO v5.4.4 which used viridiplan-tae_odb10 database for the assessment 109 .

Transcriptome assembly
The de novo transcriptome assembly was carried out using RNA-Seq data generated in this study.Trimmomatic v.0.38 was used to process raw data reads, i.e., adapter removal and quality filtration 110 .The de novo transcriptome assembly was constructed using Trinity v2.9.1 with a strand-specific option and other default parameters using the processed paired-end reads 111 .A Perl script offered in Trinity software package was utilized to evaluate the assembly statistics.

Genome annotation
The genome annotation was performed on the polished assembly.RepeatModeler v2.0.1 used this genome to construct a de novo repeat library 112 .The obtained repeat sequences were clustered using CD-HIT-EST v4.8.1 with sequence identity as 90% and 8 bp seed size 113 .Repeat library obtained from RepeatModeler was further curated using TEclass2 software with a probability threshold of > 0.65 114 .Using this curated repeat library, Repeat-Masker v4.1.0(RepeatMasker Open-4.0,http:// www.repea tmask er.org) soft-masked the genome that was used for the construction of the gene set.MAKER pipeline that employs ab initio-based gene prediction programs and evidence-based approaches to predict the final gene model was used for genome annotation 115 .As empirical evidence in the MAKER pipeline, the de novo transcriptome assembly of T. cordifolia and protein sequences of its phylogenetically closer species were used.The ab initio gene prediction, evidence-based alignments, and polishing of alignments were achieved using AUGUSTUS v3.2.3, BLAST and Exonerate v2.2.0, respectively, with the MAKER pipeline 116,117 .The coding genes were filtered based on the criteria of AED (Annotation Edit Distance) value < 0.5 and gene length ≥ 150 bp 118 .This coding gene set was annotated against NCBI-nr and Swiss-Prot database using blastp (e-value 10 -5 ), and against Pfam-A database using HMMER v3.1 (e-value 10 -5 ) 116,[119][120][121] .The completeness of genome annotation was evaluated with BUSCO v5.4.3 which used viridiplantae_odb10 database for the assessment 109 .de novo tRNAs prediction, de novo rRNAs prediction, and miRNAs identification (homology-based) were performed using tRNAscan-SE v2.0.7,Barrnap v0.9 (https:// github.com/ tseem ann/ barrn ap) and miRBase database, respectively [122][123][124] .
The orthogroups were formed using the proteome files of 27 selected eudicot species along with Zea mays and MAKER retrieved protein sequences of T. cordifolia.Among all the protein sequences, the longest isoforms were retrieved for all the species and provided to OrthoFinder v2.5.4 for orthogroups construction 130 .The orthogroups comprising genes from all 29 species were retrieved from all the identified orthogroups.KinFin v1.1 was used to increase the genes in one-to-one orthogroups that identified and extracted fuzzy one-to-one orthogroups among Vol:.(1234567890 www.nature.com/scientificreports/these retrieved orthogroups 131 .In cases where multiple genes were present for a single species in any orthogroup, the longest gene among them was selected as representative.MAFFT v7.310 was used to discretely align all the identified fuzzy one-to-one orthogroups to construct the phylogenetic tree 132 .The multiple sequence alignments were trimmed to eradicate empty sites, and the alignments were concatenated using BeforePhylo v0.9.0 (https:// github.com/ qiyun zhu/ Befor ePhylo).The concatenated alignments were used by RAxML v8.2.12, based on a rapid hill climbing algorithm, to create the maximum likelihoodbased phylogenetic tree (100 bootstrap values and amino acid substitution model 'PROTGAMMAAUTO') 133 .Similarly, another phylogenetic tree was constructed using a basal angiosperm species i.e., Amborella trichopoda as an outgroup instead of Zea mays.Additionally, MCMCtree was implemented to estimate the time of divergence between species with -nsample 1,000,000 and -burnin 10,000, and two calibration point was taken between basal eudicots-eudicots divergence (126-132 MYA) and between monocot-eudicot (142-163 MYA) from TimeTree database v5 43,134 .

Gene family expansion and contraction analysis
The proteome files from 29 selected species and the above-generated phylogenetic tree were used to analyse the evolution of gene families using CAFÉ v5 135 .The proteome files provided contained the longest isoform of each protein for all 29 species.Using TimeTree database v5.0, the divergence time between T. cordifolia and Beta vulgaris was obtained.This obtained time was used as a calibration point for adjusting the phylogenetic tree to an ultrametric species tree 43 .These protein sequences were aligned using blastp in all-versus-all mode 116 .The clustering of blastp results was performed using MCL v12-137 136 .Gene families with genes from < 2 species of the specified clades and ≥ 100 gene copies for ≥ 1 species were eliminated.Using the two-lambda (λ) model, the obtained gene families and the ultrametric species tree were used to evaluate the gene family contraction and expansion, where λ indicates a random birth-death parameter.Those with > 10 genes were considered as highly expanded or contracted gene families among the obtained gene families.The resultant highly contracted and expanded gene families were annotated using eggNOG-mapper v2.1.9 127.Further, the functional annotation of these highly expanded or contracted gene families were check manually on the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database 137 .

Genes with signatures of adaptive evolution
The above mentioned six Ranunculales species were used for identification of genes with evolutionary signatures.OrthoFinder v2.5.4 was used to construct orthologous gene sets using the proteome files from six selected species 130 .Orthogroups containing protein sequences from all six species were taken further for analysis.In case of more than one protein sequence occurrence for a species, the longest isoform was retained for analysis.
Genes with high nucleotide divergence MAFFT v7.310 was used to individually align the orthogroups across six species 132 .These alignments were used by RAxML v8.2.12 to create orthogroups specific phylogenetic tree with the 'PROTGAMMAAUTO' amino acid substitution model and 100 bootstrap value 133 .The root-to-tip branch length distance was calculated using the R package "adephylo" for every gene of all six species in the phylogenetic trees 139 .The genes of T. cordifolia with relatively higher root-to-tip branch length distance values were extracted and considered as genes with higher evolution rates or nucleotide divergence.

Genes with unique amino acid substitutions with functional impact
Multiple sequence alignments were used to identify the protein sequences with identical amino acid positions in all species except T. cordifolia.The gaps and 10 positions around the gaps in the alignments were not considered for this analysis.Sorting Intolerant From Tolerant (SIFT) with UniProt database was used to evaluate the functional impact of extracted sequences with amino acid substitutions.The obtained genes were considered genes with unique amino acid substitutions with functional impact 140 .

Positively selected genes
The nucleotide sequences of all orthogroups across selected six species were used by MAFFT v7.310 to obtain individual alignments 132 .The obtained alignments in PHYLIP format along with species phylogenetic tree were used by PAML v4.9a with "codeml" program 134 .PAML v4.9a utilized the branch-site model to identify positively selected genes.Likelihood-ratio tests were performed on these PAML-identified genes and genes with < 0.05 FDR corrected p-values were referred to as positively selected genes of T. cordifolia.The identification of positively selected codon sites was performed using Bayes Empirical Bayes (BEB) analysis with > 95% probability criteria for the foreground lineage.These positive selected genes were functionally annotated and assigned to Gene Ontology (GO) categories and KEGG pathways using WebGestalt web server and KAAS v2.1, respectively 126,128  Genes with all three signatures of adaptive evolution The three evolutionary signatures of adaptation considered were higher evolution rate, positive selection, and unique amino acid substitutions with functional impact.T. cordifolia genes that indicated two of the three evolutionary signatures were suggested as genes with multiple signatures of adaptive evolution (MSA) 118,124,[141][142][143] .The T. cordifolia genes showing all three signatures were considered genes with three signatures of adaptive evolution.The eggNOG-mapper v2.1.9was used to assign KEGG Orthology (KO) and KEGG pathway to the gene sets showing a higher rate of evolution, unique amino acid substitutions with functional impact, positive selection and three signatures of adaptive evolution 127 .The KEGG database was used to retrieve the functional annotation of gene sets from the comparative evolutionary analysis 137 .
Benzyl iso-quinoline alkaloid (BIA) biosynthesis pathway The protein sequences of genes involved in BIA biosynthesis for Papaver somniferum or Coptis chinensis were downloaded from UniProt database 144 .The BIA biosynthetic genes were checked for presence in T. cordifolia using blastp 116 .The exon-intron structures of these BIA biosynthetic genes in T. cordifolia and other five Ranunculales species were constructed using Exonerate v2.4.0 (https:// github.com/ natha nweeks/ exone rate).The distant orthologs of 15 genes involved in BIA biosynthesis pathway were identified to understand their origin.The distant orthologs were identified by HHblits web server using UniRef30 database and default parameters 143,145 .The top 20 hits were extracted considering unique genus for each of 15 genes.These 20 hits along with T. cordifolia sequence were aligned for each of the 15 genes using MAFFT v.7.310 132 .RAxML v8.2.12 with 100 bootstrap value and 'PROTGAMMAAUTO' amino acid substitution model was used to construct the phylogenetic trees for each of the genes using the obtained alignments 133 .

Other secondary metabolites biosynthesis pathway
The protein sequences of genes involved in terpenoid, lignin (backbone pathway for lignans biosynthesis), and flavonoid biosynthesis pathway for Ranunculales order members were obtained from UniProt database.The obtained genes were searched in T. cordifolia using blastp 116 .Among these T. cordifolia genes, these genes with signatures of adaptive evolution were identified.

Analysis of Common symbiosis signalling pathway genes
For CSSP, the genes reported in MacLean et al. (2017) were found in UniProt database and checked for their presence in T. cordifolia using blastp 116,144,146 .The genes were manually checked for signatures of adaptive evolution in T. cordifolia.

Analysis of adventitious root formation genes
The genes reported by Li et al. (2021) were obtained from Uniprot database and checked for their presence in T. cordifolia using blastp 116,144,147 .The genes were manually checked for signatures of adaptive evolution in T. cordifolia.

Gene expression analysis
Kallisto was used to quantify the transcripts of T. cordifolia 148 .The TPM (Transcripts Per Million) values of genes involved in BIA biosynthesis, terpenoid biosynthesis, lignin biosynthesis, flavonoid biosynthesis, CSSP, formation of adventitious root formation, and genes showing all three signatures of adaptive evolution were analyzed.

Figure 1 .
Figure 1.Summary of T. cordifolia genome assembly and genes.(a) Summary of generated data and genome assembly (b) Statistical summary of mapped genes of T. cordifolia.

Figure 2 .
Figure 2. Phylogenetic tree showing the position of T. cordifolia with other species.The phylogenetic tree includes 26 core eudicot species, two basal eudicot species (T.cordifolia and Papaver somniferum) and a monocot species.

Figure 3 .
Figure 3. Comparative analysis of T. cordifolia genome.(a) Phylogeny of six Ranunculales species including T. cordifolia (b) Venn diagram of ortholog gene clustering of six species (c) Biological processes of T. cordifolia specific genes (d) Cellular component of T. cordifolia specific genes (e) Molecular functions of T. cordifolia specific genes Figs. 3a and 3b are obtained from Orthovenn3 138 .